Goto

Collaborating Authors

 distortion-rate function


Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models

Neural Information Processing Systems

We formalize the problem of prompt compression for large language models (LLMs) and present a framework to unify token-level prompt compression methods which create hard prompts for black-box models. We derive the distortion-rate function for this setup as a linear program, and provide an efficient algorithm to compute this fundamental limit via the dual of the linear program. Using the distortion-rate function as the baseline, we study the performance of existing compression schemes on a synthetic dataset consisting of prompts generated from a Markov chain, natural language queries, and their respective answers. Our empirical analysis demonstrates the criticality of query-aware prompt compression, where the compressor has knowledge of the downstream task/query for the black-box LLM. We show that there is a large gap between the performance of current prompt compression methods and the optimal strategy, and propose Adaptive QuerySelect, a query-aware, variable-rate adaptation of a prior work to close the gap. We extend our experiments to a small natural language dataset to further confirm our findings on our synthetic dataset.




Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models

Neural Information Processing Systems

We formalize the problem of prompt compression for large language models (LLMs) and present a framework to unify token-level prompt compression methods which create hard prompts for black-box models. We derive the distortion-rate function for this setup as a linear program, and provide an efficient algorithm to compute this fundamental limit via the dual of the linear program. Using the distortion-rate function as the baseline, we study the performance of existing compression schemes on a synthetic dataset consisting of prompts generated from a Markov chain, natural language queries, and their respective answers. Our empirical analysis demonstrates the criticality of query-aware prompt compression, where the compressor has knowledge of the downstream task/query for the black-box LLM. We show that there is a large gap between the performance of current prompt compression methods and the optimal strategy, and propose Adaptive QuerySelect, a query-aware, variable-rate adaptation of a prior work to close the gap.


Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models

Girish, Adway, Nagle, Alliot, Bondaschi, Marco, Gastpar, Michael, Makkuva, Ashok Vardhan, Kim, Hyeji

arXiv.org Artificial Intelligence

We formalize the problem of prompt compression for large language models (LLMs) and present a framework to unify token-level prompt compression methods which create hard prompts for black-box models. We derive the distortion-rate function for this setup as a linear program, and provide an efficient algorithm to compute this fundamental limit via the dual of the linear program. Using the distortion-rate function as the baseline, we study the performance of existing compression schemes on a synthetic dataset consisting of prompts generated from a Markov chain, natural language queries, and their respective answers. Our empirical analysis demonstrates the criticality of query-aware prompt compression, where the compressor has knowledge of the downstream task/query for the black-box LLM. We show that there is a large gap between the performance of current prompt compression methods and the optimal strategy, and propose a query-aware, variable-rate adaptation of a prior work to close the gap. We extend our experiments to a small natural language dataset to further confirm our findings on our synthetic dataset.


Semantic Communication of Learnable Concepts

Pase, Francesco, Kobus, Szymon, Gunduz, Deniz, Zorzi, Michele

arXiv.org Artificial Intelligence

We consider the problem of communicating a sequence of concepts, i.e., unknown and potentially stochastic maps, which can be observed only through examples, i.e., the mapping rules are unknown. The transmitter applies a learning algorithm to the available examples, and extracts knowledge from the data by optimizing a probability distribution over a set of models, i.e., known functions, which can better describe the observed data, and so potentially the underlying concepts. The transmitter then needs to communicate the learned models to a remote receiver through a rate-limited channel, to allow the receiver to decode the models that can describe the underlying sampled concepts as accurately as possible in their semantic space. After motivating our analysis, we propose the formal problem of communicating concepts, and provide its rate-distortion characterization, pointing out its connection with the concepts of empirical and strong coordination in a network. We also provide a bound for the distortion-rate function.


Interpreting Rate-Distortion of Variational Autoencoder and Using Model Uncertainty for Anomaly Detection

Park, Seonho, Adosoglou, George, Pardalos, Panos M.

arXiv.org Machine Learning

Building a scalable machine learning system for unsupervised anomaly detection via representation learning is highly desirable. One of the prevalent methods is using a reconstruction error from variational autoencoder (VAE) via maximizing the evidence lower bound. We revisit VAE from the perspective of information theory to provide some theoretical foundations on using the reconstruction error, and finally arrive at a simpler and more effective model for anomaly detection. In addition, to enhance the effectiveness of detecting anomalies, we incorporate a practical model uncertainty measure into the metric. We show empirically the competitive performance of our approach on benchmark datasets.


Gaussian Approximation of Quantization Error for Estimation from Compressed Data

Kipnis, Alon, Reeves, Galen

arXiv.org Machine Learning

We consider the distributional connection between the lossy compressed representation of a high-dimensional signal $X$ using a random spherical code and the observation of $X$ under an additive white Gaussian noise (AWGN). We show that the Wasserstein distance between a bitrate-$R$ compressed version of $X$ and its observation under an AWGN-channel of signal-to-noise ratio $2^{2R}-1$ is sub-linear in the problem dimension. We utilize this fact to connect the risk of an estimator based on an AWGN-corrupted version of $X$ to the risk attained by the same estimator when fed with its bitrate-$R$ quantized version. We demonstrate the usefulness of this connection by deriving various novel results for inference problems under compression constraints, including noisy source coding and limited-bitrate parameter estimation.


Lossy Compression via Sparse Linear Regression: Computationally Efficient Encoding and Decoding

Venkataramanan, Ramji, Sarkar, Tuhin, Tatikonda, Sekhar

arXiv.org Machine Learning

We propose computationally efficient encoders and decoders for lossy compression using a Sparse Regression Code. The codebook is defined by a design matrix and codewords are structured linear combinations of columns of this matrix. The proposed encoding algorithm sequentially chooses columns of the design matrix to successively approximate the source sequence. It is shown to achieve the optimal distortion-rate function for i.i.d Gaussian sources under the squared-error distortion criterion. For a given rate, the parameters of the design matrix can be varied to trade off distortion performance with encoding complexity. An example of such a trade-off as a function of the block length n is the following. With computational resource (space or time) per source sample of O((n/\log n)^2), for a fixed distortion-level above the Gaussian distortion-rate function, the probability of excess distortion decays exponentially in n. The Sparse Regression Code is robust in the following sense: for any ergodic source, the proposed encoder achieves the optimal distortion-rate function of an i.i.d Gaussian source with the same variance. Simulations show that the encoder has good empirical performance, especially at low and moderate rates.